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Abstract 

Background: Plants respond to diverse environmental cues including microbial perturbations by coordinated 
regulation of thousands of genes. These intricate transcriptional regulatory interactions depend on the recognition 
of specific promoter sequences by regulatory transcription factors. The combinatorial and cooperative action of 
multiple transcription factors defines a regulatory network that enables plant cells to respond to distinct biological 
signals. The identification of immune-related modules in large-scale transcriptional regulatory networks can reveal 
the mechanisms by which exposure to a pathogen elicits a precise phenotypic immune response. 

Results: We have generated a large-scale immune co-expression network using a comprehensive set of Arabidopsis 
thaliana (hereafter Arabidopsis) transcriptomic data, which consists of a wide spectrum of immune responses to 
pathogens or pathogen-mimicking stimuli treatments. We employed both linear and non-linear models to generate 
Arabidopsis immune co-expression regulatory (AICR) network. We computed network topological properties and 
ascertained that this newly constructed immune network is densely connected, possesses hubs, exhibits high modularity, 
and displays hallmarks of a "real" biological network. We partitioned the network and identified 156 novel modules 
related to immune functions. Gene Ontology (GO) enrichment analyses provided insight into the key biological 
processes involved in determining finely tuned immune responses. We also developed novel software called 
OCCEAN (One Click C/s-regulatory elements /4/Valysis) to discover statistically enriched promoter elements in the 
upstream regulatory regions of Arabidopsis at a whole genome level. We demonstrated that OCCEAN exhibits 
higher precision than the existing promoter element discovery tools. In light of known and newly discovered 
c/s-regulatory elements, we evaluated biological significance of two key immune-related functional modules and 
proposed mechanism(s) to explain how large sets of diverse GO genes coherently function to mount effective 
immune responses. 

Conclusions: We used a network-based, top-down approach to discover immune-related modules from transcriptomic 
data in Arabidopsis. Detailed analyses of these functional modules reveal new insight into the topological properties of 
immune co-expression networks and a comprehensive understanding of multifaceted plant defense responses. We 
present evidence that our newly developed software, OCCEAN, could become a popular tool for the Arabidopsis 
research community as well as potentially expand to analyze other eukaryotic genomes. 
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Genomics 



* Correspondence: smukhtar@uab.edu 

department of Biology, University of Alabama at Birmingham, Birmingham, 
AL, 35294-1170, USA 

4 Nutrition Obesity Research Center, University of Alabama at Birmingham, 
Birmingham, AL, 35294, USA 

Full list of author information is available at the end of the article 

O© 2014 Tully et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative 
BlOlVlGCl CGntr3l Commons Attribution License (http://creativecommons.Org/licenses/by/2.0), which permits unrestricted use, distribution, and 
reproduction in any medium, provided the original work is properly credited. 



Tully et al. BMC Genomics 2014, 15:421 
http://www.biomedcentral.com/1471-2164/15/421 



Page 2 of 14 



Background 

Agriculture, in struggling to meet global food demands 
of a rapidly growing population, has been plagued with 
plant diseases that thwart crop production and account 
for a global annual average yield loss of 16 percent [1-3]. 
In light of the importance of agriculture to humans, it is 
essential to elucidate and understand the mechanisms of 
plant immunity at the molecular level. As with any host- 
pathogen conflict, plants and their disease agents are in 
an evolutionary arms race. When the host mounts a 
defense reaction, the pathogen develops new strategies 
to evade the defensive mechanisms, which causes the 
continuation of this cycle, ad infinitum [4,5]. Our present 
understanding of plant immune systems reveals two pri- 
mary means by which plants recognize their invaders. The 
recognition of Microbial-Associated Molecular Patterns 
(MAMPs) is the first line of defense and its associated 
MAMPs-Triggered Immunity (MTI) is highly efficient at 
repelling most pathogens. If pathogens are able to breach 
this first line of defense through the production of effector 
proteins, plants deploy Effector-Triggered Immunity (ETI) 
[4,6]. Both modes of defense cause massive transcriptional 
reprogramming which involves complex signal transduc- 
tion networks and cross-talk that is mediated by plant 
hormones, including: Salicylic Acid (SA), Jasmonic Acid 
(JA) and Ethylene (ET) [7,8]. The Arabidopsis genome 
encodes approximately 1,922 transcription factors (TFs) 
that are implicated in diverse biological processes in- 
cluding the regulation of immune signaling pathways 
[9-11]. However, the global organization of TF-DNA 
interactions remains elusive. Moreover, what remains 
mostly unknown are the precise mechanisms by which 
the plant cell integrates innumerable synergistic and an- 
tagonistic immune transcriptomic signals to orchestrate 
fine-tuned and pathogen-specific defense responses. 

In recent years, systems biology approaches, specific- 
ally network analyses that integrate experimentally de- 
rived information with computational modeling, have 
emerged as powerful tools for studying complex traits in 
diverse species [5,12-15]. Network biology provides com- 
prehensive analyses of the system's components (nodes) 
and the relationships among them (edges). In addition, 
the analyses of the topological properties of the network 
can provide further understanding of the hierarchical 
organization of a complex biological system and contrib- 
ute to the overall interpretation of biological complexity. 
In any eukaryotic cell, thousands of genes and their 
products orchestrate their transcriptional and translational 
activities to ensure the proper execution of cellular func- 
tions [13,15,16]. It has become evident that biological 
functions can be accomplished by functional modules that 
are embedded within the interaction networks including 
transcriptional gene regulatory networks [17,18]. Compre- 
hensive high-throughput analyses of gene expression can 



allow for the identification of gene clusters that are 
highly correlated in expression levels across multiple 
samples in any given cellular state [13,16]. Generally, it 
is thought that genes in the same co-expression sub- 
network are often enriched with similar functional 
annotations. Additionally, the metric to measure the 
co-expression falls into one of two major categories: 
correlation coefficients or mutual information measures 
[19]. Finally, finding common cw-regulatory elements 
(transcription factor binding sites) can aid in the identifi- 
cation of co-regulated gene clusters and characterization 
of transcriptional regulatory networks [13,14,17]. 

In the current study we employed a systems biology 
approach and report the construction of a large-scale 
Arabidopsis immune co-expression regulatory (AICR) net- 
work. We tested five diverse algorithms, i.e. PCC (Pearson 
Correlation Coefficient), ARACNE multiplicative (Algorithm 
for the Reconstruction of Accurate Cellular Networks), 
ARACNE additive, CLR (Context Likelihood of Related- 
ness), and MRNET (Minimum Redundancy NETwork) 
with different thresholds, which yielded 15 pairs of experi- 
mental networks along with their respective random net- 
works [19-25]. We employed network biology analyses 
and determined that ARACNE multiplicative network 
(5,147 nodes and 38,610 edges), with threshold of 0.8 
exhibits properties of a "true" network as it possesses 
the scale-freeness attribute (degree distribution follows 
a power law). Next, we partitioned the AICR network 
and predicted 156 functional modules containing at 
least six members with the largest module encompass- 
ing 178 nodes. Subsequently, we analyzed functional 
annotations of genes within each module and calculated 
enrichment of specific Gene Ontology terms to evaluate 
the biological significance of functional modules. To es- 
tablish a causal relationship between co-expression and 
co-regulation, we sought to identify common cw-regula- 
tory elements. First we developed a new, comprehensive 
software interface for c/s-regulatory elements discovery in 
Arabidopsis (named OCCEAN - One Click Cw-regulatory 
Elements ANalysis). We demonstrated that OCCEAN 
boasts higher capacity (can process the entire genome 
scale; over 30 million characters in a single run) and 
features higher precision than MEME (Multiple EM 
for Motif Elicitation). We identified several statistically 
enriched novel tis-regulatory elements in our dataset. 
Finally, we evaluated and discussed key immune-related 
functional modules. 

Results and discussion 

Construction of the Arabidopsis immune co-expression 
regulatory (AICR) network 

To generate a large-scale immune co-expression net- 
work, we selected Arabidopsis transcriptomic data that 
encompassed the broadest possible spectrum of immune 
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responses to pathogens or pathogen-mimicking stimuli 
treatments [26-34]. Previously, we have employed the 
same set of experiments to define transcriptional re- 
sponses of genes that encode the proteins identified in 
plant-pathogen immune network, version 1' (PPIN-1) [5] 
(Additional file 1: Table SI and Additional file 2: Support- 
ing methods). We compiled the lists of probes showing 
significant up- or down-regulation and discovered 8,377 
differentially expressed genes, which subsequently were 
used to build a comprehensive immune co-expression 
network (Additional file 3: Table S2). In a co-expression 
network, the nodes represent the genes and the edges 
(lines connecting nodes) represent the similarity/related- 
ness between the genes, which is generally measured by 
the Pearson correlation coefficient (PCC) [18]. PCC and 
other measures of association determine the degree of lin- 
ear dependencies between the variables [19]. While these 
association models have obvious statistical advantages, 
they lack the ability to capture non-linear relationships 
[35]. In contrast, mutual information (MI) is a non-linear 
statistic that provides an attractive alternative to measure 
biologically significant non-linear relationships [36]. We 
used both linear and non-linear models to generate Arabi- 
dopsis immune co-expression regulatory (AICR) network. 
To measure linear associations between two genes, we cal- 
culated the PCC values for all of the genes in a matrix of 
8,377 x 8,377 (over 70 million combinations). Then, we 
employed both a similarity threshold algorithm (a prede- 
fined single cut-off value) as well as a newly developed 
and parameter-free algorithm termed mutual ^-nearest 
neighbor (mKNN) to process the calculated PCC values 
[18]. mKNN was recently shown to possess several advan- 
tages over the commonly used similarity threshold algo- 
rithm [18]. Using the linear model, we generated seven 
pairs of experimental networks along with their respective 
random networks, i.e., PCC (0.9), K3, K10, K20, K50, 
K100 and K250 (Table 1). To infer non-linear association 
between immune-related genes, we employed four recently 
used MI methods: ARACNE multiplicative (Algorithm for 
the Reconstruction of Accurate Cellular Networks), ARA- 
CNE additive, CLR (Context Likelihood of Relatedness), 
and MRNET (Minimum Redundancy NETwork) with 
0.8 and 0.9 thresholds. Specifically, we employed the 
'parmigene' package (PARallel Mutual Information esti- 
mation for GEne NEtwork reconstruction, an R package) to 
construct eight Additional experimental networks [21-25]. 
It has been recently shown that the MI estimator im- 
plemented in parmigene provides more precision and 
unbiased results compared with previous MI estimators 
(Figure 1). Our approach to initially build multiple ex- 
perimental and random networks using both linear and 
non-linear models is aimed to determine an optimal ex- 
perimental network that displays topological properties 
of a "real" biological network [15]. 



Table 1 Comparison of multiple algorithms used to 
generate Arabidopsis Immune Co-expression (AICR) 
Network 



Algorithm 


Threshold 


Nodes 


Edges 


PCC K = 3 


Top 3 


8250 


1371822 


PCC K= 10 


Top 10 


8336 


1374677 


PCC K = 20 


Top 20 


8358 


1379968 


PCC K = 50 


Top 50 


8369 


1396874 


PCC K = 1 00 


Top 1 00 


8373 


1453118 


PCC K = 250 


Top 250 


8377 


1758775 


PCC 


90.00% 


8377 


2755191 


ARACNE additive 


90.00% 


1999 


10253 


ARACNE multiplicative 


90.00% 


1999 


10253 


MRNET 


90.00% 


1999 


3493 


CLR 


90.00% 


12 


11 


ARACNE additive 


80.00% 


5147 


32972 


ARACNE multiplicative 


80.00% 


5147 


38610 


MRNET 


80.00% 


5058 


6049 


CLR 


80.00% 


5/ 


60 



PCC, pearson correlation coefficients; ARACNE, algorithm for the reconstruction 
of accurate cellular networks; CLR, context likelihood of relatedness; MRNET, 
minimum redundancy NETwork. 



Global topological properties of the AICR network 

Network topology refers to the arrangement or pattern 
of interactions within a network. The majority of natur- 
ally occurring networks, including biological networks, 
maintain certain topological characteristics in terms of 
their structure and organization that are significantly dif- 
ferent from random networks [15,37,38]. Therefore, in 
an initial experiment, we computed degree of distribu- 
tion for all 15 pairs of experimental networks along with 
their respective random networks. We found that two 
networks, ARACNE multiplicative (5,147 nodes and 
38,610 edges) and ARACNE additive (5,147 nodes and 
32,972 edges), with threshold of 0.8 exhibit the scale- 
freeness attribute (degree distribution follows a power 
law), when compared with all random networks and other 
experimental networks (Figure 2). Since the precision of 
parmigene-based ARACNE multiplicative is better than 
ARACNE additive, we selected ARACNE multiplicative 
with threshold 0.8 as the "true" experimental AICR net- 
work [25]. 

"Real-world networks" tend to form high-density clus- 
ters and exhibit a clustering coefficient that is signifi- 
cantly higher than expected by random chance [14,18]. 
Therefore, we computed clustering coefficients of the 
AICR network and its corresponding random network 
(Figure 3). A clustering coefficient describes the degree 
of congregation among the nodes of a graph. The distri- 
bution of the average clustering coefficient in the AICR 
ranges between 0.4-0.8 and the frequency of the number 
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Nodes: 5, 147 
Edges :38,610 



3f 





Network 


Clustering 
coefficient 


Connected 
components 


Network 
diameter 


Network 
density 


Network 
centralization 


Average 
number of 
neighbors 


ARACNE 
multiplicative 
(0.8 
threshold) 


0.306 


759 


34 


0.001 


0.022 




7.501 



Node 



Edge 



► PPIN-1 node 



PPIN-1 node and effector target 



Figure 1 Construction of Arabidopsis immune co-expression regulatory (AICR) network. Network is displayed in Prefuse Force Directed 
Layout algorithm in Cytoscape. A node (blue circle) represents genes and a grey edge linking two nodes indicates the co-expression relationship 
between these two nodes based on Pearson Correlation Coefficient (PCC). Red nodes represent the proteins present in Plant-pathogen Interaction 
network (PPIN-1) [5]. The common feature of experimental the AICR network is listed on the bottom. 



of neighbors with higher clustering coefficient in the 
AICR is significantly greater than in a random network 
(Figure 3). To understand how the differences in the 
number of connections impact the topology of our co- 
expression network, we measured the shortest path (short- 
est distance between all pairs of nodes) and the closeness 
centrality (the inverse sum of shortest distances to all other 
nodes from a focal node) (Additional file 4: Figure SI and 
Additional file 5: Figure S2) [39]. The path lengths for 
the majority of the nodes in the AICR are significantly 
shorter than those of a random network (Additional file 4: 
Figure SI). In addition, the number of neighbors with 
significantly higher closeness centrality (threshold 0.2) 
is higher in the AICR network compared with random 
network (Additional file 5: Figure S2). In biological net- 
works, highly connected nodes (hubs) and nodes cen- 
tral to the network (betweenness) are thought to play 
significant regulatory roles on their adjacent nodes. 
Thus, we computed several topological properties in- 
cluding shared neighbor distribution (number of inter- 
action partners shared between two nodes) (Additional 
file 6: Figure S3), neighborhood connectivity distribu- 
tion (average connectivity of all neighbors) (Additional 
file 7: Figure S4), topological coefficients (the tendency 



of the nodes in the network to have shared neighbors) 
(Additional file 8: Figure S5) and betweenness centrality 
(nodes' centrality in a network) (Figure 4) [37,38]. These 
data suggest that the AICR network is not randomly orga- 
nized and shares properties of several previously described 
biological networks [5,12-15]. Another important character- 
istic of scale-free networks is the presence of a main com- 
ponent. Connected component analyses discovered that the 




Degree 

Figure 2 Degree distribution of the AICR. Frequency of the 
degree of the AICR (blue circles) and random (red diamonds) networks 
are indicated in log scale. Presence of highly connected nodes (hubs) 
can be observed in the AICR network. 



Tully et al. BMC Genomics 2014, 15:421 
http://www.biomedcentral.com/1471-2164/15/421 



Page 5 of 14 



C 



0.8 



</> <D 0.6 

— o 

° it 

2 o 

< 



0.4 



0.2 



• AICR 

♦ Random 



.... «v 
.•v ♦ 



10° 10 1 10 2 

Number of neighbors 

Figure 3 Average clustering coefficient of the AICR. A p ot 

between number of neighbors (X-axis) and average clustering coefficient 
(Y-axis) is drawn representing the AICR (blue circles) and random 
(red diamonds) networks. 



AICR network comprises 63 disconnected components, 
each containing at least six members. The main com- 
ponent in the AICR network has 2379 nodes and 33214 
(86%) edges. The second largest component contains 
only 79 nodes and 310 edges. This qualitative global 
topology of the AICR network is similar to previously 
known biological networks in Arabidopsis and other eu- 
karyotes [14,40]. In addition, network biology analyses re- 
vealed that degree distribution of main component of the 
AICR follows a power law compared to the main compo- 
nent of a random network with similar size (Additional 
file 9: Figure S6 and Additional file 10: Figure S7). 

Biological implications of topological properties of the 
AICR network 

Topological properties of the network can also be uti- 
lized to prioritize genes for further functional analysis. 
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Figure 4 Betweenness centrality of the AICR. Frequency of 
Betweenness centrality of the AICR (blue circles) and random (red 
diamonds) networks are indicated. 



Highly connected and highly centered nodes are likely to 
play crucial roles in maintaining network integrity and 
controlling information flow throughout the plant immune 
system. Specifically, we identified 190 highly connected 
nodes (Hub 50 encompassing at least 50 connections in the 
AICR). To characterize the significance of these highly 
connected proteins, we combined Hub of the AICR with 
biophysical interactions from plant-pathogen immune 
network, version 1' (PPIN-1; Figure 1) [5]. The PPIN-1 
was constructed using effectors (virulence factors) from 
two pathogens, Arabidopsis immune proteins, and -8,000 
other Arabidopsis proteins covering almost one-third of 
the genome [5]. Pathogens utilize effector molecules to re- 
wire the host network in a manner conducive to pathogen 
proliferation and dispersal [41]. PPIN-1 discovered 165 ef- 
fector interacting proteins (effector targets) and it was also 
demonstrated that these effector targets are highly inter- 
connected host proteins [5]. Despite the unavailability of 
effector targets from the entire Arabidopsis genome in 
PPIN-1, we found that the AICR network contains 51 ef- 
fector targets and three proteins (At3g63210, At4gll890 
and At3g07780) are Hub 50 in the AICR network (Figure 1). 
Given that transcriptional regulation does not necessarily 
coincide with biophysical interactions, this overlap is po- 
tentially significant and biologically meaningful. This also 
suggests that pathogen effectors target key regulatory pro- 
teins in both transcriptional as well as protein-protein in- 
teractions networks. 

In addition, we selected ninety nodes ("top 90") with the 
highest betweeness centrality (Figure 4). Among them, we 
found a number of previously characterized immune regu- 
lators or defense-related proteins, encompassing various 
levels of signal transduction flow: resistance proteins (RPP4, 
uncharacterized TIR-NBS-LRR) [42,43], kinases mediating 
early signaling events (MAPKKK13, WAK) [44], transcrip- 
tional executors (ANAC072, WRKY54, MYB66, WER1) 
[45-47], secretory proteins assisting with folding and 
modification of newly synthesized proteins (TRAP, CNX3, 
Secl4p, cyclophilin, UDP-Glycosyltransferase, DnaJ family, 
KMS1, GlcNAclpUT2) [48,49], enzymes involved in 
production of phytohormones and antimicrobial com- 
pounds (AOC2, ILL3, PAL2) and pathogenesis-related 
proteins (PR1) [50,51], hormonal response modulators 
(PYL6, IAA28) [52,53] and finally, components of ubiquitin- 
mediated protein degradation (UBQ4, an F-box protein 
that interacts with SKP1) [54,55]. Identification of these 
key immune regulators serves as a proof of concept for 
our analytical approach and justifies further analyses on 
suite of "top 90" proteins selected among thousands of 
other proteins in the AICR network. 

Network clustering and module annotations 

Another essential feature of a majority of biological net- 
works is their ability to naturally organize into modules 
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[14,17,18]. Discovery of such functional modules within 
biological networks is imperative for understanding prin- 
ciples of cellular organization and functions. To identify 
functional modules (subnetworks) that are comprised of 
highly interconnected sets of genes within the AICR 
network, we employed a fast agglomerative algorithm 
(FAG-EC) [56]. This new, low time complexity algorithm 
operates based on a local variable, edge clustering coeffi- 
cients, making it ideal to partition a relatively large net- 
work [37,56]. Overall, we identified 156 immune-related 
functional modules containing at least six members with 
the largest module encompassing 178 nodes (Additional 
file 11: Table S3). Furthermore, we subjected the AICR 
network to a module size distribution analysis. We com- 
puted the frequency and module size and demonstrated 
that the distribution of the module size follows a power 
law distribution with an exponential truncation (Additional 
file 12: Figure S8), which is common for several previously 
described "real-world" networks [14,18]. The module 
size distribution property further indicates that the 
AICR network possesses a modular structure. 

We also investigated the enrichment of Gene Ontology 
(GO) terms [57,58] in the ten largest immune-related mod- 
ules ("top 10") with sizes ranging from 38 to 178 distinct 



nodes using the Database for Annotation, Visualization 
and Integrated Discovery (DAVID) v6.7 [59] (Additional 
file 13: Table S4). The GO enrichment terms are re- 
markably informative in examining significant biological 
functions among co-expressed genes in large-scale net- 
works [58,59]. Several gene ontology (GO) categories 
were enriched (p-values >0.01) among the genes in the 
AICR network (Table 2). These major GO categories in- 
clude: immune-related, plastid, reproductive processes, 
nucleotide binding/kinases, intrinsic to membrane, metal- 
binding, mitochondrion and non-membrane-bounded 
organelle. We utilized these GO enrichments in com- 
bination with significantly enriched ds-regulatory elements 
to define the potential functions of the immune-related 
modules (described below). 

Development of OCCEAN software and identification of 
c/s-regulatory elements 

We sought to investigate whether all or a subset of the 
co-expressed genes within a given immune-related mod- 
ule are also co-regulated (i.e., are direct targets of a com- 
mon transcription factor). Given that transcriptional 
regulatory networks are highly complex and that func- 
tional modules may display crosstalk among themselves, 



Table 2 Significantly enriched GO terms in the ten largest immune-related modules with sizes ranging from 38 to 
1 78 genes 



Module 


Go term 




Count 


% 


P value 


Fold enrichment 


1 


GO:0000166~ 


nucleotide binding 


46 


25.84 


6.946003862092387E-5 


1.732 


1 


GQ0006796 ~ 


phosphate metabolic process 


31 


17.42 


5.723269851412882E-7 


2.705 


1 


GQ0004672 ~ 


protein kinase activity 


29 


16.29 


2.6741 028021 51 0666E-8 


3.278 


1 


GO:0008219~ 


cell death 


14 


7.87 


4.2878037087740953E-7 


6.123 


2 


GO:0009536 ~ 


plastid 


31 


30.69 


9.044388056227762E-4 


1.748 


2 


GQ0004672 ~ 


protein kinase activity 


12 


11.88 


0.007513 


2.463 


2 


GO:00 19748- 


secondary metabolic process 


8 


7.92 


0.003083 


4.073 


3 


GO:0009536 ~ 


plastid 


36 


49.32 


3.047542263866877E-10 


2.718 


3 


GO:0055114~ 


oxidation reduction 


11 


15.07 


0.002191 


3.019 


4 


GO:0005886 ~ 


plasma membrane 


16 


22.22 


0.001229 


2.356 


4 


GQ0006793 ~ 


phosphorus metabolic process 


14 


19.44 


9.7851 89944953485E-5 


3.418 


5 


GQ0009628 ~ 


response to abiotic stimulus 


9 


15.52 


0.009708 


2.845 


5 


GO:0005618~ 


cell wall 


5 


8.62 


0.056662 


3.316 


6 


GQ0009507 ~ 


chloroplast 


22 


44.00 


1.075282591 760991 6E-5 


2.499 


6 


GO:0051186~ 


cofactor metabolic process 


7 


14.00 


2.87600791 99003342E-5 


10.975 


7 


GO:0000166~ 


nucleotide binding 


12 


25.53 


0.0441 34 


1.777 


8 


GO:0009725 ~ 


response to hormone stimulus 


8 


17.39 


0.001339 


4.400 


8 


GQ0006793 ~ 


phosphorus metabolic process 


6 


13.04 


0.093212 


2.354 


9 


GO:0031224~ 


intrinsic to membrane 


13 


33.33 


0.002493 


2.407 


9 


GQ0006793 ~ 


phosphorus metabolic process 


8 


20.51 


0.008887 


3.139 


10 


GO:0032555 ~ 


purine ribonucleotide binding 


10 


26.32 


0.083444 


1.765 


10 


GO:0005886 ~ 


plasma membrane 


8 


21.05 


0.052468 


2.151 
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we can also expect that the same transcription factor can 
regulate co-expressed genes in multiple immune-related 
modules. Whereas experimentally verified plant ds-regula- 
tory elements can be retrieved from PLACE (A Database of 
Plant Qs-acting Regulatory DNA Elements) [60], Athena 
(Arabidopsis thaliana expression network analysis) [61] and 
AGRIS (The Arabidopsis Gene Regulatory Information 
Server) [11,62], only ~120 as-regulatory elements are cur- 
rently known in Arabidopsis, and a limited set of -30 
immune-related tis-elements have been described. In 
addition, the prediction of cw-elements from classical 
approaches is typically driven by a single experiment or 
dataset. Moreover, the currently available online motif 
discovery software has limited capacity to process a 
small number of gene entries. For example, frequently 
used software MEME (Multiple EM for Motif Elicitation) 
can only process up to 60,000 characters in a single run 
{e.g., 60 promoters, each of 1,000 bp in length) [63,64]. 
Another common software used to predict ds-regulatory 
elements, AthaMap (Arabidopsis thaliana Map) can only 
manage up to 200 gene entries [65]. These limitations can 
substantially decrease the chance of identifying novel cis- 
regulatory elements. Thus, we first aimed to develop 
OCCEAN (One-Click Cis-regulatory £lement A/Valysis) 
software that can process the promoter gene sequences at 
the entire genome scale (over 30 million characters) in a 
single run. OCCEAN identifies the statistically enriched/ 
depleted cw-regulatory elements (Figure 5) and integrates 
data from (i) our newly developed BLASTN-based pro- 
gram to identify short sequences and {ii) an improved 
version of the bootstrapping tool POBO (a promoter 
bootstrapping program) [64]. This user-friendly software 
requires the sequences in FASTA format as input, pro- 
cesses genome-scale level sequences, identifies common 



sequences in a given set of promoters, performs boot- 
strapping, and provides statistically enriched cis-regulatory 
elements in the gene's promoter as the output. OCCEAN 
is freely available online at http://occean.cis.uab.edu:8080/ 
occean/. We employed OCCEAN to individually process 
the sequences of immune-related genes' promoters for 
our "top 10" immune-related modules. The promoter 
sequences of the genome (33,323 genes) were used as 
background to compute the fold enrichments of putative 
cis-regulatory elements. We also analyzed the occurrences 
of all the putative cw-regulatory elements (cluster mean), 
performed 1,000 x bootstrapping to determine the back- 
ground mean and finally calculated fold enrichment ratios 
for all six-mer sequences. Three different fold enrichment 
ratios (>4, > 3 and > 2) prioritized newly identified cis- 
regulatory elements in "top 10" immune-related modules 
(Additional file 14: Table S5). To analyze the performance 
and robustness of our newly developed software, we 
compared the efficacy of OCCEAN with MEME. We 
generated a list of experimentally known ds-regulatory el- 
ements and computed the precision of both OCCEAN 
and MEME using cw-regulatory elements identified in our 
top 10 immune-related modules. MEME was unable to 
compute the analyses on top 4 immune-related modules 
as it can't process over 60,000 characters simultaneously. 
We computed true positives (nTPs), false positives (nFPs) 
and precision for MEME as well as OCCEAN enrichment 
ratios > 4, > 3 and > 2 (Figure 6). We selected an enrich- 
ment ratio > 3 as the optimal value for OCCEAN as it 
yields a higher number of nTPs, significantly fewer nFPs 
positives and greater precisions compared with enrich- 
ment ratios > 4 and > 2. In summary, we demonstrated 
that OCCEAN boasts higher capacity (can process the en- 
tire genome scale; over 30 million characters in a single run) 
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Figure 6 Performance and accuracy of OCCEAN. True positives 
(nTPs), false positives (nFPs) and precision for MEME as well as OCCEAN 
with multiple enrichment ratios are shown. 



and features higher precision than MEME (Figure 6). 
Identification of cis-regulatory elements among co-expressed 
genes could provide additional information for predic- 
tion of biological function. 

Inferring biological relevance of immune-related modules 

To fend off potential pathogens, plants employ two 
major types of immune responses: MAMPs-Triggered 
Immunity (MTI) and Effector Triggered Immunity (ETI) 
[4]. ETI is essentially a high amplitude re-booting of the 
immediate but weaker MTI. ETI results in robust dis- 
ease resistance responses, including localized host cell 
death (hypersensitive response; HR) and systemic signal- 
ing [4]. In addition, upon pathogen recognition at the 
MTI and/or ETI levels, the plant cell undergoes extensive 



transcriptional changes involving complex reiterative sig- 
naling networks and cross-talk controlled by phytohor- 
mones [66]. This leads to the metabolic reprogramming 
and production of an array of antimicrobial compounds 
[27]. Here we highlight two major immune-related func- 
tional modules, the largest identified Module 1 (Figure 7, 
Additional file 11: Table S3 and Additional file 13: Table 
S4) and a more compact Module 8 (Figure 7, Additional 
file 15; Supporting results, Additional file 11: Table S3 and 
Additional file 13: Table S4) as two most representative 
gene clusters reflecting the broad array of diverse patho- 
gens and immune stimuli used in the microarray experi- 
ments employed in our analyses. Module 1 is the largest 
module identified by our clustering analyses and com- 
prises 178 nodes, 15,720 edges, clustering coefficient of 
0.692 and network density of 0.499. The genes of Module 
1 are enriched in three major GO categories: (1) kinases/ 
ribonucleotide binding, (2) immune responses/programmed 
cell death and (3) transmembrane proteins. 

Diverse kinases account for the major functional cat- 
egory of Module 1, encompassing 46 out of 178 proteins 
representing receptor-like kinases (RLKs), mitogen-activated 
protein kinases (MAPKs), calcium-dependent protein 
kinases (CDPKs), wall-associated kinases (WAKs), etc. 
Typically, RLKs perceive MAMPs such as flg22, elfl8, 
chitin and OGs and trigger MTI [27]. Subsequently, a 
MAPK cascade is activated and downstream signaling 
ensues including phytohormonal crosstalk [8,27,44,67]. 
These events result in the activation of a wide-spectrum 
of immune responses, such as induced biosynthesis of 
phytoalexins and other antimicrobial secondary metabo- 
lites such as glucosinolates, calcium influx, as well as pro- 
duction and accumulation of reactive oxygen species 
(ROS) [27,68]. 

At the second, more powerful layer of the immune re- 
sponse, plants usually deploy NLR receptors (nucleotide- 
binding domain and leucine-rich repeat receptors; a 
major class of R proteins). The NLR receptors directly 
recognize specific effectors or indirectly detect effector 
activities and trigger ETI [1,2,6,7]. We discovered 8 NLR 
genes in the kinase sub-cluster of Module 1. These in- 
clude proteins conferring resistance to diverse pathogens 
such as Pseudomonas syringae, Albugo Candida, Hyalo- 
peronospora arabidopsidis and various Fusarium races. 

The second most abundant GO category in Module 1 
was immune-related genes (count: 32). This functional 
sub-cluster contains various defense-related transcrip- 
tion factors such as WRKY15, WRKY33, MYB113 and 
ANAC061 [9,47], four members of the RING-H2 finger 
protein family, a heat shock transcription factor HSF A-4a 
[53], an ethylene-responsive transcription factor ERF105 
and, intriguingly, three scarecrow-like GRAS family tran- 
scription factor genes, known to be primarily implicated in 
plant root development [69-71]. Collectively, transcription 
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factor genes account for over 40% of this sub-cluster's 
gene count. In addition, we identified a small group of 
genes strongly associated with response to fungal cell 
wall elicitor chitin, such as two auxin-regulated SAUR 
genes, one lectin-like, Enhanced Disease Susceptibility 1 
(EDS1) and senescence-associated protein SAG102 [72-76]. 

Another enriched GO category in Module 1 was 
membrane-associated proteins including both plasma 
membrane- and organellar membrane-resident peptides; 
not surprisingly, the largest class of these membrane 
proteins is transporters. In plants, uptake and translocation 
of nutrients play essential roles in physiological processes 
including plant growth, nutrition, signal transduction, and 
development [77-79]. Transport processes are also critical 
for the reallocation of resources and it was previously 
shown that defense signals cause realignment of transport 
activities to redirect resources toward immune responses 
and protect tissues of high value [80]. Of 31 genes assigned 
to this sub-cluster, seven were transport proteins (22.6% of 
this sub-cluster's gene count). The prevailing class was 
sugar (galactose and sucrose) transporters, in agreement 
with the fact that this class of proteins constitutes a key 
component for carbon partitioning at the whole plant 
level and is involved in both symbiotic and parasitic 
plant-fungi interactions [81]. Additionally, we identified 



CNI1 (Carbon-Nitrogen Insensitive 1) that is a key regu- 
lator of the Carbon/Nitrogen response for growth 
phase transition in Arabidopsis seedlings [82], as well 
as ATP Binding Cassette (ABC) and inorganic phos- 
phate transporters that were previously postulated to 
be required for organ growth, nutrition, development, 
and stress responses [83]. Last but not least, we also 
identified two other well-known immune regulators: 
Syntaxin 121/PEN1 and ML02 genes, required for re- 
sistance against barley powdery mildew, Blumeria graminis 
sp. hordei and a fungal pathogen Colletotrichum higginsia- 
num [84,85]. 

Another essential aspect of large-scale transcriptomic 
analyses is to link co-expression with co-regulation, i.e., to 
determine presence of common as-regulatory elements 
among genes of the same module. Several known cis- 
regulatory elements were discovered in the promoters 
of Module 1 genes (Additional file 14: Table S5). Two of 
the enriched elements, TCATGG and CATGGA, overlap 
with the octadecameric CArG box sequence 5'-CTTACC 
TTTCATGGATTA-3 ', identified in the APETALA3 pro- 
moter [86]. APETALA3, a class B homeotic organ iden- 
tity gene, was originally discovered as the central 
regulator of petal development in Arabidopsis flowers, 
but later also shown to be involved in control of the 
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floral meristem proliferation, regulation of flowering time, 
and other plant reproductive processes [87-89]. Inhibition 
of reproductive processes is an expected outcome of an im- 
mune stimulation, given the recent report describing a tran- 
sition from growth to defense following immune stimuli 
treatments in Arabidopsis [53]. Involvement of a develop- 
mental regulator in defense responses is not unprecedented 
and was previously shown for a MYB-related gene ASYM- 
METRIC LEAVES 1 (AS 1) [90]. 

Jasmonic Acid (JA) and Ethylene (ET) are two plant 
hormones antagonistic to SA [91] and down-regulation 
of major JA/ET signaling proteins will promote SA signal- 
ing by suppressing the JA/ET pathway [67]. Consistent 
with this observation, we identified an enriched element, 
ATCTTG that resembles the binding site for Ethylene- 
Insensitive 3 (EIN3) and three EIN3-LIKE (EIL) proteins 
[92] as well as two hexamers GTCGTC and CGTCGT, 
which overlap with the core binding site of JASE2 motif 
in two JA biosynthetic genes OPR1/OPR2 [93]. Recently, 
EIN3 and EIL1 were shown to negatively regulate MAMP- 
triggered immunity via a direct binding and down- 
regulation of the SA biosynthetic gene SID2 [94] . Given 
that OPRs are implicated in JA biosynthesis, we expect 
a fine-tuned interplay between SA and JA/ET signaling 
pathways [95]. In addition, several novel ds-regulatory 
elements were discovered in our study (Additional file 14: 
Table S5), and the identity of the cognate transcription 
factors and their functional relevance will be subject to 
future studies. Very recently, it was demonstrated that 
numerous transcription factors can recognize secondary 
elements, which in some cases are completely sequence- 
unrelated to the primary element [96]. In conjunction with 
this study, our analysis of Module 1 shows that ARR14 
(a MYB family transcription factor) not only recognizes 
the known primary ds-regulatory element AGATA/ 
TCG but also can specifically bind to a previously un- 
known as-regulatory element A GATCT. Thus, it's likely 
that transcription factor(s) have extended list of binding 
specificities beyond the currently known c/s-regulatory 
elements. We propose that the above described tran- 
scription factors and additional unknown regulatory 
proteins coordinate the gene regulation in this module. 

Our data provide further insights into the transcrip- 
tional regulatory mechanisms repressing additional puta- 
tive negative regulators of plant defense upon treatments 
with pathogen-mimicking stimuli. 

Conclusions 

In this study, we used a systems-level network biology 
approach to construct genome-wide Arabidopsis im- 
mune co-expression network and demonstrated that this 
network shares properties of a 'real-world network'. 
Topological properties-based partitioning allowed us to 
unravel 156 distinct immune-related functional modules. 



We demonstrated that nodes in the same module are not 
only highly correlated at the expression level but also 
densely connected to each other. Detailed analyses of two 
key immune-related modules provided a systems-level un- 
derstanding of how plant cells coordinate distinct immune 
signals to orchestrate fine-tuned and pathogen-specific 
defense responses. Our novel approach to discover cis- 
regulatory elements using OCCEAN is an effective method 
of solving the issue of finding novel motifs within a se- 
quence set. OCCEAN has advantages over other programs 
of the same purpose, such as APPLES (Analysis of Plant 
Promoter-Linked Elements) and MEME [63,97]. APPLES 
requires finding organisms of a specified relational distance 
for comparison, which can burden the user, and MEME 
has the statistical risk of discarding actual existing motifs. 
These problems are avoided in our solution whilst continu- 
ing to maintain a fair amount of focus on client-side 
simplicity. In addition, OCCEAN has the capacity to be 
expanded for analyses of other eukaryotes genomes, 
such as fly and human. Our systems -level approach to 
examine ds-regulatory elements (the putative transcrip- 
tion factor binding sites) in the promoters of the co- 
expressed genes made it possible to link co-expression 
to co-regulation of genes in the same module. 

Methods 

Data download, selection criteria for differentially expressed 
(DE) genes dataset and promoter sequence acquisition 

We utilized available transcriptomic data of transcrip- 
tional responses extracted from 271 microarray experi- 
ments representing nine major immune-related studies 
(Additional file 1: Table SI and Additional file 2: Support- 
ing methods) [26-34]. Priority was given to well-referenced 
studies, employing the Affymetrix ATH1 GeneChip array, 
encompassing the broadest possible spectrum of plant 
defense responses upon pathogen infections or pathogen- 
mimicking stimuli treatment (Additional file 2: Supporting 
methods). Lists of probes showing significant up- or down- 
regulation in each experimental condition were compiled, 
using criteria for significance employed in the respective 
original study (Additional file 3: Table S2). This led us 
to the identification of a list 8,377 genes differentially 
expressed between all treatments [5]. For each of these 
genes, we acquired 1000 bp upstream of the transcrip- 
tional start site from TAIR Version 10 at www.arabi- 
dopsis.org [98]. These upstream regions were searched 
for putative transcription factor binding sites. 

Network construction, topological properties, network 
clustering and Gene Ontology (GO) 

The microarray data presented in Additional file 3: Table S2 
was used to construct a gene co-expression network 
using both linear and non-linear models. In the linear model, 
Pearson Correlation Coefficients (PCC) was measured 
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based on the mutual k-nearest neighbor method of 
Ruan et al. [18,97] with some modifications. In contrast 
to Ruan et al. [18,97], the k-list for each gene included 
k-1 other genes, and at least one (and possibly more) 
gene sharing the smallest of the k PCCs. For example, 
for a k of 10, the k-list might include 9 genes in order 
of decreasing PCC and one or more genes having the 
next lower PCC value. This was done in consideration 
of the fact that all genes of equal PCC are equally valid 
as connected nodes. Random networks were con- 
structed by randomly assigning gene expression values 
(0 = no change, +1 = up-regulated, -1 = down-regulated) 
to the genes from the immune transcriptional profiling 
experiments. The network based on PCC-threshold was 
inferred by first calculating the PCC for every gene- 
gene pair in the dataset yielding 8,377 x 8,377 PCC 
values. These values were tested against 0.9 threshold 
and only the PCC values that were greater than or 
equal to 0.9 were selected to indicate that there is an 
edge between this gene pairs. Other values were dis- 
carded. R was used to calculate the non-linear mutual 
information (MI) [99]. First the MI was calculated using 
parmigene package were the MI for every gene pair was 
calculated resulting in 8377 x 8377 MI values [25]. Then 
these values where used by the ARACNE multiplicative 
(Algorithm for the Reconstruction of Accurate Cellular 
Networks), ARACNE additive, CLR (Context Likelihood 
of Relatedness), and MRNET (Minimum Redundancy 
NET work) algorithms, using the same package, to infer a 
weighted adjacency matrix [21-25]. 80% and 90% of the 
maximum MI values in each weight adjacency matrix 
were chosen as the threshold to determine the existence 
of an edge between each pair of genes. If the value is 
greater than or equal to the threshold, this indicates an 
edge between a given pair of genes. 

A modified and parallelized Cytoscape version 2.8.4 
[37,38] and the Clusterviz version 1.2 (FAG-EC algorithm) 
[37,38,56,100] and ClusterMaker version 1.10 [100,101] 
plugins were used to visualize this network, calculate its 
parameters and network topological properties. Network 
partitioning was performed using FAG-EC algorithm 
[56,100] and the Markov Clustering Algorithm [102]. For 
each gene in the network sub-clusters, a Gene Ontology 
[57,58] molecular function and biological process was 
assigned from the Database for Annotation, Visualization 
and Integrated Discovery (DAVID) v6.7 [59]. 

BLAST-based generation of common ds-regulatory el- 
ements Sequences obtained from TAIR [98] (1000 bp 5' 
of the transcription start site) were incorporated into a 
local BLAST [103,104] database using the formatdb util- 
ity that is included in NCBI's package download of 
BLAST. Each sequence in the local BLAST database was 
used in turn as a query sequence to find subsequences 
shared in the promoter regions of a number of genes. 



The resulting common subsequences were imported into 
a relational database. Similar sequences with at least 
75% identity and an E-value<lx 10~ were scanned 
using a sliding window search in order to extract all 
contained 6-mers. All 6-mers having no more than four 
consecutive single nucleotide repeats were deemed puta- 
tive transcription factor binding sites and retained for 
further computational analysis. This process has been in- 
corporated into our web-accessible tool, OCCEAN. 

Statistical enrichment of c/s-regulatory elements using 
modified version of POBO 

Putative transcription factor binding sites results from 
BLAST were analyzed for statistically significant over 
representation/under representation using the POBO 
(a promoter bootstrapping program) tool [64]. POBO 
is an exceptional tool for statistical sequence signifi- 
cance analysis, but is inhibited by the constraint that it 
can perform analysis only on one c/s-regulatory element at 
a time. As our sequence sets were high in number, manual 
entry would be infeasible. Therefore, a wrapper program 
was written to take each individual sequence, create a 
cluster file consisting of the set of promoter identifiers/ 
sequences from the original experimentally found gene 
promoter dataset that contain the sequence, and run an 
instance of POBO with the sequence and cluster file as in- 
put. The program then extracted the desired information 
from the resulting HTML files from all of the individual 
runs and put them in a single spreadsheet for analysis. 
Running POBO locally required downloading the source 
code from http://ekhidna.biocenter.helsinki.fi/poxo/pobo/ 
and setting up a local MySQL database with all known 
Arabidopsis gene promoters corresponding to -33,000 
genes as the background data. 1 kb promoter sequence for 
all ~33,000 genes were obtained from the TAIR10 genome 
release, and used as background for the POBO analysis. 
The POBO results were converted to spreadsheet format 
for further analysis. True positives (nTPs), false positives 
(nFPs) and precision for MEME as well as OCCEAN were 
calculated as described in [25]. 

Development of One-Click Cis-Element ANalysis (OCCEAN) 
software for genome-wide identification of c/s-regulatory 
elements 

The bulk of OCCEAN on the server-side was developed 
in the Java language (version 7) using Apache Tomcat 
(version 7, as well) on a Linux server. A Java Servlet inter- 
face was used to communicate with OCCEAN's web inter- 
face. OCCEAN automates and integrates the information 
obtained from BLAST and POBO analysis. This user 
friendly software requires promoter sequences in FASTA 
format. Background sequences of the species under study 
are imbedded in the software. OCCEAN is time-efficient 
and will return a link to the file containing the results the 
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BLAST and POBO. OCCEAN's web interface was devel- 
oped using HTML, CSS, Javascript, and AJAX for in-page 
update notifications. 

Additional files 



Abbreviations 

mKNN: mutual k-nearest neighbor; GO: Gene ontology; OCCEAN: One click cis- 
regulatory elements ANalysis; MAMPs: Microbial-associated molecular patterns; 
MTI: MAMPs-triggered immunity; ETI: Effector-triggered immunity; SA: Salicylic 
acid; JA: Jasmonic acid; ET: Ethylene; TFs: Transcription factors; PCC: Pearson 
correlation coefficient; FAG-EC: Fast agglomerate algorithm- edge clustering; 
MEME: Multiple EM for motif elicitation; AthaMap: Arabidopsis thaliana map; 
PLACE: A database of plant cis-acting regulatory DNA elements; AGRIS: The 
arabidopsis gene regulatory information server; POBO: A promoter 
bootstrapping program; APPLES: Analysis of plant promoter-linked elements, 
nTPs, True positives; nFPs: False positives; ARACNE: Algorithm for the 
reconstruction of accurate cellular networks); CLR: Context likelihood of 
relatedness; MRNET: Minimum redundancy NETwork AICR: Arabidopsis immune 
co-expression regulatory. 

Competing interests 

The authors declare that they have no competing interests. 
Authors' contributions 

MSM and AEH designed the experiments. JPT performed data mining, designed 
and developed OCCEAN and computed cross-comparison as-regulatory elements 
experiments. JPT and RW performed c/s-regulatory element enrichment 
experiments. HMRA performed PCC analyses, constructed immune 



co-expression network, computed network topological properties and performed 
GO annotation analyses. MSM, AEH and AS supervised HMRA, JPT and RW. 
MSM and AEH wrote the manuscript and all authors edited the manuscript. 
All authors read and approved the final manuscript. 



Acknowledgements 

The authors are greatly thankful to Dr. Purushotham Bangalore and Mr. Larry 
Owen for technical support. The authors thank Drs. Karolina Mukhtar and 
Michelle Amaral, Ms. Cassandra Garbutt and Ms. Jessica Lopez for critical 
reading of the manuscript and valuable suggestions. 

Author details 

department of Computer & Information Sciences, University of Alabama at 
Birmingham, Birmingham, AL, 35294-1 170, USA. 2 Current Address: Samuel 
Ginn College of Engineering, 3101 Shelby Center, Auburn University, Auburn, 
AL, 36849-5347, USA. department of Biology, University of Alabama at 
Birmingham, Birmingham, AL, 35294-1 170, USA. 4 Nutrition Obesity Research 
Center, University of Alabama at Birmingham, Birmingham, AL, 35294, USA. 

Received: 13 August 2013 Accepted: 27 May 2014 
Published: 3 June 2014 



References 

1. Mukhtar MS: Engineering NLR immune receptors for broad-spectrum 
disease resistance. Trends Plant Sci 2013, 1 8(9):469-472. 

2. Pajerowska-Mukhtar KM, Emerine DK, Mukhtar MS: Tell me more: roles of 
NPRs in plant immunity. Trends Plant Sci 2013, 18(7):402-41 1. 

3. Wulff BB, Horvath DM, Ward ER: Improving immunity in crops: new tactics 
in an old game. Curr Opin Plant Biol 201 1, 1 4(4):468-476. 

4. Jones JD, Dangl JL: The plant immune system. Nature 2006, 444(71 1 7):323-329. 

5. Mukhtar MS, Carvunis AR, Dreze M, Epple P, Steinbrenner J, Moore J, Tasan 
M, Galli M, Hao T, Nishimura MT, Pevzner SJ, Donovan SE, Ghamsari L, 
Santhanam B, Romero V, Poulin MM, Gebreab F, Gutierrez BJ, Tarn S, 
Monachello D, Boxem M, Harbort CJ, McDonald N, Gai L, Chen H, He Y, 
Consortium EUF, Vandenhaute J, Roth FP, Hill DE, et al: Independently 
evolved virulence effectors converge onto hubs in a plant immune 
system network. Science 2011, 333(6042)596-601 . 

6. Nishimura MT, Dangl JL: Arabidopsis and the plant immune system. Plant 
J Cell Mole Biol 2010, 61 (6):1 053-1 066. 

7. Mukhtar MS, Nishimura MT, Dangl J: NPR1 in plant defense: it's not over 
'til it's turned over. Cell 2009, 137(5):804-806. 

8. Spoel SH, Dong X: How do plants achieve immunity? Defence without 
specialized immune cells. Nat Rev Immunol 2012, 12(2):89-100. 

9. Eulgem T, Somssich IE: Networks of WRKY transcription factors in defense 
signaling. Curr Opin Plant Biol 2007, 10(4):366-371. 

10. Guo A, He K, Liu D, Bai S, Gu X, Wei L, Luo J: DATF: a database of 
Arabidopsis transcription factors. Bioinformatics 2005, 21(10):2568-2569. 

1 1 . Yilmaz A, Mejia-Guerra MK, Kurz K, Liang X, Welch L, Grotewold E: AGRIS: 
the arabidopsis gene regulatory information server, an update. Nucleic 
Acids Res 201 1, 39(Database issue):Dl 1 18-D1 122. 

12. Arabidopsis Interactome Mapping C: Evidence for network evolution in an 
Arabidopsis interactome map. Science 201 1, 333(6042):601-607. 

13. Carrera J, Rodrigo G, Jaramillo A, Elena SF: Reverse-engineering the 
Arabidopsis thaliana transcriptional network under changing 
environmental conditions. Genome Biol 2009, 10(9):R96. 

14. Mao L, Van Hemert JL, Dash S, Dickerson JA: Arabidopsis gene co- 
expression network and its functional modules. BMC Bioinformatics 2009, 
10:346. 

15. Vidal M, Cusick ME, Barabasi AL: Interactome networks and human 
disease. Cell 201 1, 144(6)386-998. 

16. Ma S, Bohnert HJ: Integration of Arabidopsis thaliana stress-related 
transcript profiles, promoter structures, and cell-specific expression. 
Genome Biol 2007, 8(4):R49. 

1 7. Heyndrickx KS, Vandepoele K: Systematic identification of functional plant 
modules through the integration of complementary data sources. Plant 
Physiol 2012, 159(3):884-901. 

18. Ruan J, Perez J, Hernandez B, Lei C, Sunter G, Sponsel VM: Systematic 
identification of functional modules and cis-regulatory elements in 
Arabidopsis thaliana. BMC Bioinformatics 201 1 , 12(Suppl 12)52. 



Additional file 1: Table SI. Differentially expressed (DE) genes that 
were determined from defense-related experiments. 

Additional file 2: Supporting methods Transcriptomic data of 
transcriptional responses extracted from 271 microarray experiments 
representing nine major immune-related previous studies. 

Additional file 3: Table S2. AGIs of the up-regulated and down-regulated 
genes derived from Table SI studies. 

Additional file 4: Figure SI. Distribution of shortest paths in the AICR 
(blue circles) and random (red diamonds) networks. 

Additional file 5: Figure S2. Closeness centrality property of the AICR 
network. Distribution of closeness centrality in the AICR (blue circles) and 
random (red diamonds) networks. 

Additional file 6: Figure S3. Evaluation of frequency of number of shared 
neighbors in the AICR (blue circles) and random (red diamonds) networks. 

Additional file 7: Figure S4. Determination of neighborhood connectivity 
frequency in the AICR (blue circles) and random (red diamonds) networks. 

Additional file 8: Figure S5. Distribution of topological coefficients in 
the AICR (blue circles) and random (red diamonds) networks. 

Additional file 9: Figure S6. Degree distributions of main components 
in the AICR (blue circles) and random (red diamonds) networks. 

Additional file 10: Figure S7. Betweenness centrality of main 
components in the AICR (blue circles) and random (red diamonds) networks. 

Additional file 11: Table S3. Identification of 156 immune-related 
modules. Size of each module is indicated. 

Additional file 12: Figure S8. Distribution of module size in the AICR 
network. Frequency of module size in the AICR (blue circles) network is 
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